A key component of online and blended learning content, open educational
resources (OER) are heralded in a global movement toward high-quality,
affordable, accessible, and personalized education. However, stakeholders
have expressed concern about scaling OER use due to a lack of means to
ensure a fit between learner, resource, and task. Usage data, or
“paradata,” such as reviews, ratings, views, downloads, favorites, and
shares, may yield insight into the fit. We examined paradata from the
National Science Digital Library (NSDL), the largest extant accessible
corpus, for the extent to which K-12 science, technology, engineering, and
mathematics (STEM) resource fit can be determined from user- and
system-generated data. We conducted sentiment analyses of user reviews and
correlations between the sentiment scores and data elements. Some
relationships between NSDL paradata elements suggested aspects of resource
fit. Despite prior research indicating that user reviews tended to be
strongly positive or strongly negative, the results of this study indicated
that educators left feedback that contained a blend of sentiments and that
users usually downloaded resources they viewed. The results of this study
suggest that while it is unlikely that educator feedback can currently be
used to assess resource quality, with larger and more robust usage data
sets, this is a fertile area for further research into nuanced sentiment.
We conclude with observed data trends and further research directions to
inform online learning.

USAGE DATA AS INDICATORS OF OER UTILITY

Horn and Stake (2014) noted that “one of the most significant ways that
online learning has improved is by leaning more heavily on in-person,
bricks-and-mortar experiences to provide support and scaffolding for
students learning online” (p. 32). This shift, driven largely by the need
for instructors to personalize learning experiences more precisely to
learners’ needs, has been accompanied by an increased need for a wealth of
resources that address varied interests, abilities, and learning paths
(U.S. Department of Education, 2017b). Efforts to promote learning
personalization through blended and online learning, common standards,
competency-based education, and K-12 digital textbooks are rapidly
increasing open educational resource (OER) demand (Ash, 2012; Mickey &
Meaney, 2013). The vision for these efforts is for educators to curate and
create high-quality openly licensed educational resources in order to
provide students with more personalized learning opportunities than
learning environments based in traditional print materials can support
(U.S. Department of Education, 2017a).

Personalization depends on an adequate supply of high-quality digital
learning resources: there must be enough resources to meet a range of
learning tasks and needs, and the resources must be effective and accurate.
Policymakers and educators are concerned about an adequate supply of
affordable, readily available, high-quality learning resources (Lagoze,
2010; McMartin et al., 2008; Tonks, Patrick, & Bliss, 2013), particularly
in the areas of science, technology, mathematics, and engineering (STEM)
learning (Porcello & Hsi, 2013). To determine resource quality and
appropriateness, educators may need to rely on more than resource
descriptions (i.e., metadata) to judge suitability. Descriptive metadata
have been a primary means to support OER selection and implementation
(Abramovich & Schunn, 2012). While descriptive metadata may help to
determine what a resource is about, educators also benefit from information
about how other educators have used the resource and whether they found it
to be effective (U.S. Department of Education, 2012, 2013).

Metadata are data about the data; paradata are interaction data, that is,
the data around the data (MacNeill, Campbell, & Hawksey, 2014). Learning
analytics practitioners can potentially use the data generated by
educational resource repository users to recognize patterns in digital
resource use and impact (Abramovich, Schunn, & Correnti, 2013). Because
paradata include system-generated and social contributions such as ratings,
reviews, comments, favorites, shares, downloads, and other activity data
relating to resource use, their analysis can give insight into educators’
and learners’ perceptions and preferences (Campbell & Barker, 2013; U.S.
Department of Education, 2012).

Complementary work has been done to assess OER quality from metadata based
on expert reviews (e.g., Bethard, Wetzler, Butcher, Martin, & Sumner,
2009); however, little work has been done to determine ways in which
user-contributed and user-generated paradata might reflect perceptions of
resource quality and use.

A popular way of deriving meaning from free-text user-provided feedback is
opinion mining via sentiment analysis. Sentiment analysis has grown out of
the need for providers like shopping site Amazon.com (Liu, Yu, An, & Huang,
2013), travel planning site Trip Advisor (Lak & Turetken, 2014), and
restaurant review site Yelp (Ganu, Kakodkar, & Marian, 2013; Seaman &
Allen, 2014) to be able to recommend products and services to their users.
This emerging area of text mining and computational linguistics may also
provide a useful, but underexplored, approach to examining learning
resource usage data.

Purpose and Research Questions 

The purpose of this study is to explore the relationship between paradata 
elements and user perceptions of resource quality. To meet this purpose, we 
addressed the following research questions (RQ): 

RQ1. To what extent are users’ assessments of resource quality positive, 
negative, or neutral? 

RQ2. What are the relationships between user sentiments and other learning
resource usage data?

In this study, we first examined user reviews in OER paradata with
sentiment analysis to explore how useful affective feedback may be for
developing and curating collections of quality-assured web-based resources
useful for teaching and learning. Then, we attempted to determine the
possible relationships between sentiment and other user activities like
sharing and favoriting.

Whether encountered in a fully online or blended learning environment, OER
can generate data that can allow for tailoring of resources to learners’
needs and determining resources’ usefulness, thus closing the loop between
what is taught and what is learned (Essa, 2016). Once a problem of interest
only to learning analytics researchers, the need to understand which
resources will help learners grasp concepts most effectively and equalize
learning opportunities is a national imperative and an issue of social
justice (U.S. Department of Education, 2017a).


LITERATURE FOUNDATION 

Teaching and learning with digital resources can engage, challenge, and
motivate students with compelling experiences that encourage independent
and collaborative learning. A personalized mix of aesthetic, technical, and
educational resource designs enhances learning engagement and motivation.
When learners have the ability to use and create digital resources to
extend their repertoires, they can realize learning outcomes; support their
agency; develop metacognitive skills and higher order thinking skills; and
participate in reflection and collaboration (Chen & Sun, 2012; Project
Tomorrow, 2012a). While effective teaching is an important element in
fostering learner outcomes, in the immediate future, so is the availability
of high-quality, engaging, personalized learning content (Hanover Research
Council, 2011; Maull, Saldivar, & Sumner, 2010; New Media Consortium [NMC],
2014).

Learning personalization results from data-informed decisions made at key
points in the instructional process: determination of teacher and student
characteristics; guidance of a coherent, rigorous curriculum; execution of
appropriate student-centered teaching activities; and application of
meaningful formative assessments (U.S. Department of Education, 2012).
Whether in an elementary or secondary online course, blended learning
environment, or instructional improvement system, this process rests on a
foundation of high-quality, plentiful, multimodal OER (Collins & Levy,
2013; Hewlett Foundation).

Open Educational Resources 

The Hewlett Foundation has defined OER as: 

[T]eaching, learning, and research resources that reside in
the public domain or have been released under an intellectual
property license that permits their free use and re-purposing
by others. [OER] include full courses, course materials,
modules, textbooks, streaming videos, tests, software, and any
other tools, materials, or techniques used to support access to
knowledge. (para. 3)

This definition indicates that OER may be used in a variety of learning 
settings, including those that are entirely face-to-face, those that are entirely 
online, or those that represent a combination or blend of the two. 

OER originated in and are often created for K-12 STEM learning (Hanson &
Carlson, 2005; Mardis, 2003; Mardis & Howe, 2010; Mardis & Zia, 2003).
While the benefits of using OER versus costly commercial instructional
materials are evident, problems of determining OER quality have persisted
for over two decades (Hewlett Foundation, 2013; Leutkemeyer & Mardis, 2016;
Okerson, 2000; Project Tomorrow & Blackboard, 2016). In an unpublished
report of a 2013 meeting of statewide education officials hosted by the
National Science Digital Library (http://nsdl.org), participants stated
that the operationalization of their digital learning initiatives relied on
an adequate supply of high-quality K-12 STEM OER. Participants cited
threats such as a lack of curation strategies to manage collections;
insufficient resources to identify and vet OER; and a strong desire to
automate the selection, validation, and management processes as essential
issues to resolve if their delivery of curricula based on the Common Core
State Standards (CCSS) and Next Generation Science Standards (NGSS) were to
be successful (National Science Digital Library [NSDL], 2013).

Many researchers have noted that descriptive metadata standards are neither
necessary nor sufficient to address issues of expressing learning resource
quality to users (Bethard et al., 2009; Porcello & Hsi, 2013; Wetzler et
al., 2013) because aspects of OER quality often include resource curation
issues that are not captured in descriptive metadata schema. OER curation
is discussed in the following section.

OER Curation 

Resource selection, management, and promotion comprise the curatorial
enterprise (Rosenbaum, 2011). Researchers have demonstrated that teachers
have tended to prefer learning materials that have been reviewed by experts
and used with positive effect by educators nearby and in similar
circumstances (Abramovich & Schunn, 2012; Williams & Coles, 2003). In
recent years there has been a growing awareness that usage data, such as
ratings and user reviews, are needed to properly curate the content in
repositories and determine resource suitability for teaching and learning
tasks. As Griffin (2013) pointed out, “Subjective reviews submitted from
highly qualified educators as well as independent reviewers are valuable
elements that could be, and should be, included as unique data...” (para.
3). Features that capture other aspects of the user experience include
incidences of favoriting, sharing, viewing, and downloading (Campbell &
Barker, 2013).

Traditional instructional materials are primarily reviewed and selected for
content accuracy, particularly in the STEM fields (Spiegel, 1989; Stern &
Roseman, 2004). As materials have become more digital and curriculum
standards more influential, media quality and standards alignment issues
must also be routinely considered (Hanson & Carlson, 2005; Mardis, ElBasri,
Norton, & Newsum, 2012). However, in an autonomous search mode, when
teachers assess quality as they collect resources for immediate use,
sentiment and quality determination tend to vary widely (Perrault, 2007;
Recker et al., 2011; Recker et al., 2007). Some teachers favor assessments
based on the resource’s appearance (e.g., colors, font, web page format)
and some prefer assessments based on content (e.g., number and currency of
citations, content provider affiliation) (Price, 2007). For this reason,
many teachers and education policymakers have expressed preferences for
repositories of vetted resources with clear designations of quality,
utility, and curriculum support (Griffin, 2013; Project Tomorrow &
Blackboard, 2016; Sumner, Khoo, Recker, & Marlino, 2003).

However, encouraging teachers to contribute plentiful, complete, and
accurate reviews of resources has proven to be difficult, despite the fact
that many teachers have responded favorably to complete reviews left by
other educators (Abramovich et al., 2013). Incentives are rarely effective
because many teachers view feedback about their resource preferences as
personal criticism. Also, teachers tend to be altruistically motivated and
enthusiastic about sharing feedback when they feel the resource is valuable
and would help other teachers (Van Acker, van Buuren, Kreijns, & Vermeulen,
2013). To gain insight into teachers’ feedback behaviors, we looked to
studies of contributions from other domains, including online news readers,
shoppers, and service consumers.

Usage Data in Other Domains 

As the number of websites that provide informational and commercial
resources has grown, so has the availability of tools that capture user
feedback in usage data. Like educational usage data, these data exist in
two forms: 1) subjective usage data such as user reviews, comments,
annotations, and recommendations; and 2) objective usage data that are
system-generated, like numbers of ratings, views, downloads, and shares. In
this section, we explore not only users’ motivations to provide feedback,
but also the types of feedback consumer-users tend to contribute.

Motivation to Contribute Ratings and Reviews 

In a longitudinal study, Tenenboim and Cohen (2015) examined the
association between users’ clicking and commenting activities for online
news items, as well as differences and similarities between items with a
high number of views and items with a high number of comments. Significant
differences were found between the two groups of items, especially
regarding the news topics and elements that may arouse controversy or
curiosity. Recently, researchers (Ziegele, Breiner, & Quiring, 2014) used
mixed methods to study news story user commenting behavior, the results of
which implied that commenters could trigger further response by including
controversial or unexpected statements, personalizing comments, and
expressing conflicting opinions in their postings and by avoiding
incomprehensibility and negativity. Length, position, the news medium
itself, and the news story topic further affected the probability of
whether a comment received feedback. In another longitudinal study,
Springer, Pfaffinger, and Engelmann (2015) investigated users’ motives and
inhibitors for commenting on news websites. When comments were visible and
accessible beneath their respective articles, users reported deep
engagement with other users’ viewpoints and suggestions for related
stories.

Consumer Ratings and Reviews

Many content providers allow users to contribute their feedback about
resources. User reviews, comments, and annotations can provide valuable
information that other users may find helpful, but they must be
systematically analyzed and integrated into a content provider’s
recommender system to have maximum effect. Mumdabi and Schiff (2010)
analyzed Amazon.com reviews to determine whether sentiment extremity,
review depth, and product type affected the perceived helpfulness of the
review. They found that reviews with extreme ratings were perceived as less
helpful than reviews with moderate ratings. Review depth had a positive
statistical correlation with the perceived helpfulness of the review. The
researchers also reported that Amazon.com reviews that contained positive
and negative comments and moderate ratings (for example, three out of five
stars) were perceived as most helpful. These findings were echoed by
subsequent researchers working with ratings and reviews in the consumer and
leisure domains (e.g., Kronrod & Danziger, 2013; Ludwig et al., 2013).

Recent research has also revealed possible problems with written user
reviews. For example, while consumer research firm YouGov (2014) reported
that most Americans used consumer reviews and that almost half who used
them also contributed reviews, users were not likely to leave negative
reviews unless they had a very bad experience with the product. Instead,
most Americans opted not to leave negative reviews to avoid being critical.
YouGov also reported that nearly a fifth of its adult survey respondents
contributed false negative reviews for malicious or spiteful reasons.

Because the study of usage data in non-education domains is mature, this
prior work may offer insight into phenomena that may be uncovered through
the study of educational usage data. While there may be ways to increase
the extent to which educators contribute reviews and ratings of learning
resources, educational paradata may also be affected by issues observed in
other domains: the difference between expert and community concepts and
expressions of quality, difficulty in controlling review quality, and an
absence of means for quality to be automatically characterized to enhance
personalization (Massart & Shulman, 2013). Without further study, these
issues will likely also remain barriers to scaling OER use.

METHOD 

In this study, we analyzed usage data from resources in the National
Science Foundation’s (NSF) National Science Digital Library (NSDL), a
leading provider of K-12 STEM-focused OER. NSDL housed OER metadata from
nearly 400 NSF-funded content providers. NSDL was at the forefront of
envisioning OER metadata integrated with usage data and coined the term
paradata in its proposed research agenda in 2010 (VanGundy, 2010a).

Description of the Data

NSDL began providing usage data for resources in its collections in January
2011 in the form of annotations about the resources and related information
on usage (VanGundy, 2010b). NSF discontinued NSDL funding, and new data
sets were no longer available after December 2013. To date, no other K-12
OER paradata sets are publicly available. In this study, we used NSDL
paradata from January 2011 to December 2013. NSDL paradata consist of usage
data provided by the OER content providers listed in Table 1.

NSDL has also contributed two distinct usage data schemas (comm_anno and
comm_para) (NSDL, 2014a, 2014b, 2014c), which are considered among the
preferred paradata formats by many OER providers (Bienkowski, Brecht, &
Klo, 2012; Campbell & Barker, 2013; Niemann, Scheffel, & Wolpers, 2012;
U.S. Department of Education, 2014). The “comm_anno” XML sets contain
review text, and the “comm_para” XML sets contain non-textual usage data
such as star ratings, favorites, and downloads. Table 1 lists the
collections that provided these types of paradata to NSDL.

Table 1

NSDL Paradata Providers, Domains, and Learner Levels

Data Provider            STEM Domain                                          Level
CLEAN                    Climate science education                            All
ComPADRE                 Physics and astronomy education                      All
CTE Online               Career and technical education                       K-12
DLESE                    Earth systems education                              All
iCPALMS                  All STEM education                                   K-12
Instructional Architect  All STEM education                                   K-12
PBS Learning Media       All STEM education                                   All
PRISMS                   All STEM education                                   All
SMILE Pathway            Science and mathematics informal education           All
TeachEngineering         Engineering education                                K-12
TeachSpatial             Spatial cognition, learning, and literacy education  All

The researchers harvested the XML usage data files directly from NSDL
servers and then combined the comm_anno and comm_para files to integrate
all available usage data for each resource. In sum, we extracted a total of
2,505 comm_anno and comm_para resource records from NSDL. Table 2 provides
an overview of the usage data included in the integrated file.

Table 2

Types of Usage Data Present in NSDL Paradata Records (N=2505)

Usage Data Type            Occurrence
Annotations (subjective)          752
Downloads (objective)              17
Favorites (subjective)            499
Features (subjective)             101
Ratings (subjective)              519
Recommended (subjective)          113
Views (objective)                 504
Total                            2505


As Table 2 shows, the sample included 752 comments. Records also included
1232 other occurrences of user-provided (subjective) data in the form of
favorites, features, ratings, and recommendations as well as 521
system-generated (objective) contributions in the form of downloads and
views, for a total of 2505 discrete usage data elements.

We used the Simple API for XML (SAX) Java API¹ to parse the XML documents.
The parser extracted data for usage-data-specific fields (such as user
comments, views, and downloads in our case) and excluded content from other
fields. The parsed usage data were redirected to an Excel spreadsheet using
the Poor Obfuscation Implementation (POI) API.
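
The event-driven extraction described above can be illustrated with a minimal sketch. The study used the Java SAX API; the sketch below uses Python's standard-library `xml.sax` module instead, purely for illustration. The element names (`record`, `comment`, `views`, `downloads`) and the sample document are hypothetical and do not reproduce the actual comm_anno or comm_para schemas.

```python
import xml.sax

# Hypothetical field names; the real comm_anno/comm_para schemas differ.
USAGE_FIELDS = {"comment", "views", "downloads"}


class UsageHandler(xml.sax.ContentHandler):
    """Collect text from usage-data fields and ignore every other element."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one dict per <record>
        self._row = None     # record currently being read
        self._field = None   # usage field currently open, if any
        self._chunks = []    # text pieces of the open field

    def startElement(self, name, attrs):
        if name == "record":
            self._row = {}
        elif name in USAGE_FIELDS and self._row is not None:
            self._field = name
            self._chunks = []

    def characters(self, content):
        # SAX may deliver one text node in several pieces, so accumulate them.
        if self._field is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == self._field:
            self._row[name] = "".join(self._chunks).strip()
            self._field = None
        elif name == "record" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_usage(xml_text):
    """Return a list of dicts holding only the usage-data fields."""
    handler = UsageHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.rows


# Made-up sample document for illustration only.
sample = """<records>
  <record><title>Tides</title><comment>Great video</comment><views>12</views></record>
  <record><title>Volcanoes</title><downloads>3</downloads></record>
</records>"""
rows = parse_usage(sample)
```

Each resulting row could then be written to a spreadsheet or CSV file; the `title` element is skipped because it is not listed among the usage fields.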

Procedure 

We performed our analysis in two phases to establish sentiment polarity 
(positive or negative) and its relationship to subjective rating (e.g., “two and 
a half stars”) and system-generated usage data. 

Phase I. Sentiment Analysis 

We used sentiment analysis to determine whether user reviews and comments
were positive, negative, or neutral and assigned a numerical weight
according to each determination. As the base for our sentiment analysis, we
used commonly accepted techniques (Padmaja & Fatima, 2013), including the
Natural Language Toolkit (NLTK), an external library for Python, because it
can perform a wide range of text processing and analysis tasks. We used an
NLTK-trained text processing classifier derived from the data set created
by Pang and Lee (2005). This classifier contains data from movie and
Twitter reviews in which the content of reviews is categorized as positive,
negative, or neutral; these sentiments were used as the basis for
comparison. This classification approach is the most widely accepted
approach to classifying natural language in sentiment analysis (Piryani,
Madhavi, & Singh, 2017).

1 Available from http://docs.oracle.com/javase/7/docs/api/javax/xml/parsers/SAXParserFactory.html

The classification process proceeded as follows:

1. Initially, “stop words” (e.g., a, and, the) were filtered from user
comments by importing a corpus of stop words from the NLTK library.

2. Featured words were then extracted from the comments into a list in
which every word was ordered by the number of times it appeared.

3. The classifier then created a “feature extractor” to select the relevant
featured words. The feature extractor returned a list of tuples in which
each tuple contained a dictionary entry and featured sentiment for each
comment.

4. A Naive Bayes classifier² was then used to classify the comments. It
combines the probability of each label, that is, the label’s frequency in
the training set, with the support from each featured word.

5. The trained classifier was then used to classify user comments and
assign a sentiment output number. Possible sentiments range from -1 (fully
negative) to 1 (fully positive). Zero (0) is considered neutral.
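
The five steps above can be sketched in a few lines of standard-library Python. This is an illustrative stand-in, not the NLTK classifier used in the study: the stop-word list, the training examples, and the class name `TinyNaiveBayes` are all made up, only two labels are modeled, and the mapping of probabilities to a score as P(pos) minus P(neg) is an assumption, since the source does not state how the output number was derived.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list; the study imported NLTK's stop-word corpus.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "was", "for", "my", "to"}


def features(text):
    """Steps 1-3: lowercase, drop stop words, keep the set of remaining words."""
    words = (w.strip(".,!?\"'") for w in text.lower().split())
    return {w for w in words if w and w not in STOP_WORDS}


class TinyNaiveBayes:
    """Naive Bayes over word presence with add-one smoothing."""

    def fit(self, labeled_texts):
        self.label_counts = Counter()
        self.word_counts = {}
        self.vocab = set()
        for text, label in labeled_texts:
            self.label_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in features(text):
                counts[word] += 1
                self.vocab.add(word)
        return self

    def probabilities(self, text):
        """Step 4: combine label frequency with per-word support."""
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for word in features(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(exp.values())
        return {k: v / norm for k, v in exp.items()}

    def sentiment(self, text):
        """Step 5 (assumed mapping): P(pos) - P(neg), so 0 reads as neutral."""
        p = self.probabilities(text)
        return p.get("pos", 0.0) - p.get("neg", 0.0)


# Made-up training examples, not the Pang and Lee (2005) data.
train = [
    ("great video for my students", "pos"),
    ("excellent lesson, love it", "pos"),
    ("broken link and confusing text", "neg"),
    ("boring and confusing lesson", "neg"),
]
model = TinyNaiveBayes().fit(train)
score_pos = model.sentiment("Great lesson, my students love this video")
score_neg = model.sentiment("The text was confusing and boring")
```

With this toy training set, the first comment scores above zero and the second below zero; a comment with no known words scores exactly zero because only the equal label priors remain.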

For resources that had multiple reviews, we calculated an average (i.e., 
arithmetic mean) of sentiment classification scores. Once the sentiment 
analysis scores were assigned, both researchers independently reviewed 
random selections of the polarity assignments and achieved approximately 
90% agreement with the machine assignment. 
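
As a minimal sketch of the two calculations described above, the following computes a per-resource arithmetic mean and a simple percent agreement between machine and human labels. The resource identifiers, scores, and labels are hypothetical, and simple percent agreement is an assumption about how the agreement figure was computed.

```python
from collections import defaultdict
from statistics import mean


def average_by_resource(scored_reviews):
    """Arithmetic mean of sentiment scores for each resource identifier."""
    grouped = defaultdict(list)
    for resource_id, score in scored_reviews:
        grouped[resource_id].append(score)
    return {rid: mean(scores) for rid, scores in grouped.items()}


def percent_agreement(machine_labels, human_labels):
    """Share of sampled items on which the human and machine labels match."""
    matches = sum(m == h for m, h in zip(machine_labels, human_labels))
    return 100 * matches / len(machine_labels)


# Hypothetical identifiers, scores, and labels for illustration only.
reviews = [("res-1", 0.5), ("res-1", -0.25), ("res-2", 0.75)]
averages = average_by_resource(reviews)
agreement = percent_agreement(
    ["pos", "pos", "neg", "neu", "pos"],
    ["pos", "pos", "neg", "pos", "pos"],
)
```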

Phase II. Bivariate Correlation

Using Pearson product-moment correlation analysis, the annotation sentiment
average values (N=501) were first correlated with subjective usage data
(favorites, features, ratings, and recommendations), then with objective
usage data (downloads, views). The researchers chose the Pearson
correlational approach because it is well suited to exploring linear
relationships between pairs of variables (Green & Salkind, 2005). Finally,
the subjective and objective usage data were correlated absent the average
sentiment values. All correlation analyses were conducted using the
Statistical Package for the Social Sciences (SPSS).
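
For readers without SPSS, the Pearson product-moment coefficient can be computed directly from its definition. The sketch below is a plain-Python illustration under that assumption; the function name `pearson_r` and the sample values are hypothetical and are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical average sentiment scores and star ratings for four resources.
avg_sentiment = [0.1, 0.4, 0.5, 0.8]
rating = [2, 3, 4, 5]
r = pearson_r(avg_sentiment, rating)
```

The coefficient always falls between -1 and 1; a perfectly linear pair such as `[1, 2, 3]` and `[2, 4, 6]` yields 1.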

Validity Concerns 

A limitation of sentiment analysis is that machine learning for this kind
of data set is relatively new and needs further research and testing before
it can approach 100% accuracy. Subtle language and domain-specific language
can affect sentiment accuracy. As Pang and Lee (2008, p. 21) pointed out:

Sentiment and subjectivity are quite context-sensitive, and, at
a coarser granularity, quite domain dependent (in spite of the
fact that the general notion of positive and negative opinions
is consistent across different domains). Note that although
domain dependency is in part a consequence of changes in
vocabulary, even the same expression can indicate different
sentiment in different domains.

2 Available from http://nltk.googlecode.com/svn/trunk/doc/api/nltk.classify.naivebayes.NaiveBayesClassifier-class.html

Neutral reviews required internal parsing to determine polarity. There are
certain user comments like “5 star” or “cool” and some comments that begin
with strong positive sentiment and end with weak negative sentiment, for
which the calculated sentiment is neutral, although the review sentiments
are considered positive. After consulting research by experts in sentiment
analysis on how to address this issue (Wilson, Wiebe, & Hoffmann, 2009), we
determined that manual review was needed for the analysis of such comments
to determine true expressed sentiment and reclassify them accordingly.

Finally, it should be noted that the prior research done in this area
suggests that automated sentiment analysis works best with big datasets and
cannot be relied upon for smaller data samples (Pang & Lee, 2008). Despite
these potential challenges, each researcher checked different random
samples of the polarity assignments and found the sentiment analysis to be
90% accurate.


RESULTS 

The NSDL file reflected the textual and non-textual usage data collected
for 2,505 records. Of the records in that file, 757 had user-contributed
annotations, i.e., comments or reviews. These reviews contained 13,353
words of resource review text, with an average of 18 words per annotation
and approximately two sentences per annotation. Figure 1 depicts the
distribution of comments across records.

Few records had two comments (n=135). The slope continues downward with 60
records having three comments, 28 records having four comments, 15 records
having five comments, nine records having six comments, six records having
seven comments, two records having eight comments, and only one record
having nine comments.

Figure 1. Distribution of annotation entries per record (N=757).

In the 757 annotation records, “video” (n=376), “students” (n=370), “great”
(n=362), “use” (n=354), “good” (n=345), “Excellent” (n=337), “lesson”
(n=328), “class” (n=320), “love” (n=312), and “information” (n=304) were
the most frequently used terms. These frequencies suggested what we would
find in the sentiment analysis.
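
Term counts of the kind reported above can be produced with a simple word tally. The sketch below is an illustration only: the sample annotations and stop words are made up, and lowercasing before counting is an assumption (the source reports at least one capitalized term).

```python
import re
from collections import Counter


def term_counts(annotations, stop_words=frozenset()):
    """Count how often each word appears across all annotation texts."""
    counts = Counter()
    for text in annotations:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts


# Made-up annotations for illustration only.
annotations = [
    "Great video for my students",
    "Good lesson, great information",
    "Students love this video",
]
counts = term_counts(annotations, stop_words={"for", "my", "this"})
top_terms = counts.most_common(3)
```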


Phase 1. Sentiment Analysis Results 

The first step of the sentiment analysis was to examine all of the
annotation entries and determine their polarity. Then, we divided the range
of sentiment scores into three equal ranges: negative, neutral, and
positive. Figure 2 provides an overview of the distribution of the
annotations across polarity.
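
Dividing a score range into three equal-width bins can be sketched as follows. The sketch assumes that the range being divided is the observed range of scores, using the lowest and highest scores reported in this section (-.754 and 1.07); the source does not state this explicitly, and the function names are hypothetical. Under that assumption the cut points fall at about -.146 and .462, close to the category boundaries reported in the results.

```python
def polarity_cut_points(lowest, highest):
    """Return the two cut points that split [lowest, highest] into thirds."""
    width = (highest - lowest) / 3
    return lowest + width, lowest + 2 * width


def polarity(score, cut_low, cut_high):
    """Label a score by the equal-width bin it falls into."""
    if score < cut_low:
        return "negative"
    if score < cut_high:
        return "neutral"
    return "positive"


# Assumed observed range, taken from the extremes reported in the results.
cut_low, cut_high = polarity_cut_points(-0.754, 1.07)
labels = [polarity(s, cut_low, cut_high) for s in (-0.5, 0.0, 0.5)]
```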



Figure 2. Distribution of annotation polarity (N=757).

As Figure 2 depicts, the data set included 135 neutral annotations with
sentiment scores ranging from -.147 to .460, with a mean of .042 and a
median of .000. The 49 negative annotations had sentiment scores ranging
from -.754 to -.146, with a mean of -.364 and a median of -.404. Most of
the annotations were positive (n=579). Positive sentiment scores ranged
from .461 to 1.07, with a mean of .508 and a median of .500.

For resources that had more than one review (n=501), we then calculated the
arithmetic means of the sentiments for those resources to enable further
explorations. Figure 3 depicts the distribution of average annotation
sentiment means relative to a normal curve.



[Figure 3: histogram of Average Sentiment with normal curve; Mean =
.3623424038, Std. Dev. = .2791595380, N = 501]

Figure 3. Distribution of average annotation sentiment scores (N=501).

As Figure 3 shows, the distribution of the average annotation sentiment
somewhat fits the normal curve, with a skewness of -.956 and a kurtosis of
1.064. The negative skewness indicates that the data are skewed left, that
is, the left tail is long relative to the right tail. A positive kurtosis
indicates a “peaked” distribution, with few extremely high values.
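
Skewness and kurtosis can be computed from the central moments of a sample. The sketch below uses the simple moment-based (population) formulas, with kurtosis expressed as excess kurtosis so that a normal distribution scores zero. Statistical packages such as SPSS apply small-sample adjustments, so values from this sketch would not match packaged output exactly; the sample data are made up.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample adjustment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("moments are undefined for constant data")
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Made-up scores with one low outlier, which produces a long left tail.
left_tailed = [0.9, 0.8, 0.85, 0.95, 0.1]
skew, kurt = skewness_and_excess_kurtosis(left_tailed)
```

A negative first value signals a longer left tail, matching the interpretation given above.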

Next, we plotted the individual sentiment scores versus the average
sentiment scores, as Figure 4 depicts.

Figure 4. Plots of individual annotation sentiment scores and average
sentiment scores.

When the average annotation sentiments were plotted against the individual
sentiment scores, the results showed that, while the averaging muted very
high and very low values, the majority of the average scores remained
clustered similar to the pattern of the individual scores, thus suggesting
that they would be viable for use in correlational analyses to detect
linear relationships.

Phase 2. Bivariate Correlations 

Next, we calculated Pearson product-moment correlation coefficients among
the six non-textual usage data elements and average sentiment scores. Using
the Bonferroni approach to control for Type I error across the bivariate
correlations, a p value of less than .01 was required for significance
(Green & Salkind, 2005). Table 3 depicts the results.
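
The Bonferroni approach divides a family-wise error rate by the number of tests to obtain the per-test threshold. The sketch below illustrates the arithmetic only. The family-wise rate of .05 and the count of five tests are assumptions chosen because they yield the .01 threshold stated above; the source does not report which values were used, and the p values shown are made up.

```python
def bonferroni_threshold(family_alpha, n_tests):
    """Per-test significance threshold under the Bonferroni approach."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


def significant(p_values, family_alpha, n_tests):
    """Flag each p value that falls below the Bonferroni threshold."""
    threshold = bonferroni_threshold(family_alpha, n_tests)
    return [p < threshold for p in p_values]


# Assumed values: a .05 family-wise rate spread over five tests gives .01.
threshold = bonferroni_threshold(0.05, 5)
flags = significant([0.001, 0.02, 0.009], 0.05, 5)
```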











[Table 3: Pearson correlations and two-tailed significance among
Downloaded, Favorited, Featured, Rating, Recommended, Viewed, and Average
Sentiment]

The results of the analyses presented in Table 3 show that four
correlations were statistically significant. The strongest correlation was
between Featured (i.e., the number of times the resource was a featured
resource on the website) and Downloaded at r=1.00. Other significant
correlations were between Viewed and Favorited (r=.384), Viewed and
Featured (r=.305), and Average Sentiment and Rating (r=.158).

Although not statistically significant, negative correlations were observed
between Viewed and Downloaded (r = -.372), Featured and Favorited
(r = -.285), and Rating and Favorited (r = -.059). Average Sentiment and
Rating demonstrated very few notable relationships with objective or subjective
usage data elements. The findings, when compared to the literature foundation,
provided points for discussion about the research questions, which we
take up in the following section.

DISCUSSION 

In this section, we examine the research questions in light of the presented
literature and the findings.

RQ1. To what extent are users’ assessments of resource quality positive, 
negative, or neutral? 

Pang and Lee (2005) established standards for positive, negative, and
neutral sentiments in neutral contributions; we used these standards to classify
contributors’ resource reviews. The literature relating to user reviews
suggests that although users most frequently leave comments that express
extremely positive or extremely negative sentiment, comment readers and
users value comments that include both positive and negative assessment
elements. NSDL commenters tended to use a balanced approach because
the sentiments primarily contained resource description words and secondarily
contained resource use words. Most reviews appeared to be about
the format of the resource, e.g., “lessons” and “videos.” The fewest annotation
words related to the content of resources, such as “water” and “Hemisphere.”

Our usage data analyses suggested that while users left positive
comments more frequently than negative ones, the comments overall tended to
be weakly positive, weakly negative, or neutral. These kinds of assessments
suggest that NSDL collection users, presumably educators, blend positive
and negative elements in their annotations, along the lines of the “feedback
sandwich” form of teacher feedback in which a negative comment is placed
between positive comments (Milan, Parish, & Reichgott, 2006; Parkes,
Abercrombie, & McCarty, 2013) or the modulated language that consumers
tend to employ when reviewing a product they have experienced (Kronrod
& Danziger, 2013; Mudambi & Schuff, 2010). Unlike the content of consumer
reviews, very few of the annotations reflected a very high sentiment
score, which few readers find helpful anyway (Ludwig et al., 2013).

RQ2. What is the relationship between sentiment assessments and resource 
use? 

Bivariate correlational analysis revealed very weak or no correlations
between annotation sentiment and indicators of resource use, such as incidences
of downloading, viewing, or favoriting. This finding is in contrast
with prior studies in which researchers reported a relationship between star
ratings and downloads (Abramovich et al., 2013). However, the relationship
between Featured and Downloaded was significant and strong, suggesting
that when a collection provider promotes a resource, additional attention
results in a willingness to use the resource. The power of promotion may also
account for the significant relationship between Featured and Viewed.

The result that Viewed and Favorited had a significant correlation may be
symptomatic of a Facebook-like behavior in which a user reviews quickly
and likes (or, here, favorites) any items that seem to appeal (Gerlitz &
Helmond, 2013; Sharifrazi & McCabe, 2014). It is difficult to determine the
extent to which this relationship proxies an assessment of the resource’s
quality or usefulness. However, that this relationship is stronger than the
relationship between Viewed and Average Sentiment may be telling of a new
type of user feedback preference. From a curation perspective, it may be
important for collection providers to consider targeted promotion of their
collections to ensure broad use.


CONCLUSION 

Driven by the common standards movement and federal education
accountability requirements, education trends toward personalized learning
are prompting an explosion of interest, and even anxiety, about securing
a large supply of high-quality open educational resources for K-12 learners
(Hanover Research Council, 2012; Hewlett Foundation, 2013; Patrick,
Worthen, Frost, & Gentz, 2016). Personalization thrives in an environment
that enables teachers to adapt content in ways that allow students to explore,
create, and demonstrate their knowledge (de los Arcos, Farrow, Pitt, Weller,
& McAndrew, 2016).

In this study, we took an initial look at the extent to which subjective
and objective learning resource usage data from the National Science Digital
Library, a leading K-12 OER provider, can be used alone to determine
resource usage, utility, and quality.

Overall, the sentiment analysis revealed only initial insight into users’
resource quality perceptions. Reviews tended to be short, with brief, declarative
sentences that reflected a narrow range of sentiment. Bivariate correlations
suggested that more could be learned from the relationship between
non-textual usage data than from the relationship between sentiment text
and non-textual usage data elements. These possible limitations may be due
to sentiment analysis’ underappreciated usefulness for predictions
(Gayo-Avello, 2012) or NSDL’s modest educational usage data corpus (Lak &
Turetken, 2014). However, another possible interpretation of this study’s
findings is that sentiment analysis of educational usage data is not yet useful
to detect valence or affect, and sentiment analysis alone may never be
sufficient. This conclusion aligns with Gerlitz and Helmond’s (2013) point that
user engagement on the web increasingly consists of quick, easy, measurable
button-generated interactions such as ratings and “likes.” Despite this
uncertainty, the researchers sense that this type of research will yield much
more interesting and helpful results as the usage data corpus grows in size
and quality. Nonetheless, this study did produce some recommendations and
directions for further research.

Limitations, Recommendations, and Directions for Further Research 

Several limitations to this study should be acknowledged and can be
addressed with recommendations for content providers. These areas also
provide suggested directions for further research.

1. Beware of “junk” annotations and misspellings. The data set contained
annotations such as “That was crazy Jeffrey wrote this,” or “b,”
or “I really love this sight.” Content providers can address this issue
by providing a review scaffold with character limits and spell check
features. Researchers should be sure to check for misspellings and
inappropriate reviews and flag the annotations for further review.

2. Be attuned to subtle language and overly declarative annotations.
Because research has suggested that users tend to soften their language
when writing reviews, sentiment analysis may underrepresent
subtle reactions such as “so educational” and “thank you for making
this video.” Likewise, sentiment analysis may overweight hyperbolic
comments such as “this is the coolest thing I’ve ever seen.” These issues
are especially problematic in the analysis of a mixed sentiment
comment such as “Very simple. I used as a homework assignment.
The students wanted more variety and more challenge, though.” Content
providers can scaffold with annotation prompts instead of free
text boxes that will guide users’ contributions with specific questions.
Scaffolds can also prompt for comments relating to content and
teaching strategies. Content providers can also employ annotation-voting
features to allow users to mark reviews they find helpful.

Researchers can improve and expand the dictionaries they use to detect
sentiment and use word-, phrase-, and concept-level analyses (Cambria,
Schuller, Liu, Wang, & Havasi, 2013). Researchers can also consider
a discipline-specific dictionary that reflects aspects of resource quality
drawn from instructional materials selection and teaching literature.

3. Factor in the user’s identity. User profiles were not part of this study,
and user job role, experience level, and work site may have a relationship
with review content. Collection providers may wish to consider
gathering minimal user profile information to gain a sense of the contributor.
Researchers may explore differences between collections that
contain discipline expert annotations and collections with teacher annotations
as well as compare expert and teacher annotation ratings,
downloads, and other non-textual usage data. Expert input shows great
promise as a basis for machine learning techniques to automatically
assess resource quality (Bethard et al., 2009; Wetzler et al., 2013).
The extent to which sentiment analysis can factor into improving or
extending automatic assessment is a fertile area for exploration.

4. Factor in the context of contribution. Many reviews appear to have
been left during a professional development event or content provider
presentation. To the extent possible, collection providers may wish to
capture the context in which the review is contributed, perhaps with
a checkbox or radio button selection in the annotation authoring dialogue.
Researchers may analyze reviews for indicators of application
such as “Students examined this video in small groups...”
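
To illustrate the mixed-sentiment problem raised in the recommendations above, the following minimal sketch scores an annotation sentence by sentence with a small word dictionary and flags annotations whose sentences disagree in polarity. The dictionary, the function names, and the sample annotation are hypothetical and invented for this example; they are not the dictionary or method used in this study, and a real analysis would rely on a validated sentiment lexicon.

```python
import re

# Hypothetical toy dictionary: +1 for positive words, -1 for negative words.
TOY_LEXICON = {
    "great": 1,
    "helpful": 1,
    "clear": 1,
    "confusing": -1,
    "boring": -1,
    "outdated": -1,
}


def sentence_scores(annotation):
    """Return one summed dictionary score per sentence in the annotation."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", annotation.strip()) if s]
    scores = []
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        scores.append(sum(TOY_LEXICON.get(word, 0) for word in words))
    return scores


def is_mixed(annotation):
    """True when at least one sentence is positive and another is negative."""
    scores = sentence_scores(annotation)
    return any(s > 0 for s in scores) and any(s < 0 for s in scores)


# Invented annotation: the two sentences cancel out when averaged.
sample = "Great video. Confusing worksheet."
scores = sentence_scores(sample)
average = sum(scores) / len(scores)
print(scores, average, is_mixed(sample))
```

In this sketch the averaged score for the sample looks neutral even though the annotation contains both praise and criticism, which is why a sentence-level flag may be a useful complement to a single averaged score.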

The power of usage data to inform personalized learning ecosystems
may not be ready to be fully realized, but as the field of learning analytics
continues to grow as a result of, or along with, the rising use of OER and
their social media features, this type of large-scale educational data analysis
offers teachers, students, and researchers an unprecedented opportunity to
better understand the role of instructional resources in learning and teaching.